
mrahmedcomputing

KS3, GCSE, A-Level Computing Resources

Lesson 4. Information Encoding Systems


Lesson Objective

  • Be able to convert letters into binary using ASCII.
  • Understand the purpose of Unicode.
  • Define the terms bit, nibble, byte, kilobyte, megabyte, gigabyte and terabyte using standard prefixes.

Lesson Notes

Bit Patterns

Bit patterns play a crucial role in representing various types of data. Whether it's text, images, sound, or integers, everything is ultimately translated into binary form: combinations of 1s and 0s. Let's explore how different data types are converted into these bit patterns:

Text (Characters):

When you press a key on your keyboard, it needs to be transformed into a binary number so that the computer can process it and display the corresponding character on the screen.

The ASCII code (American Standard Code for Information Interchange) assigns a unique binary number to each character. For instance, the uppercase letter A has the decimal code 65, which is 1000001 in 7-bit binary.

ASCII code covers special characters, punctuation, return keys, control characters, as well as uppercase and lowercase letters. It can represent 128 characters, which suffices for most English words but falls short for other languages.
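To see the conversion in action, here is a minimal Python sketch. The function name `to_ascii_binary` is made up for this example; it relies on Python's built-in `ord`, which returns a character's numeric code, and pads each code to 7 bits.

```python
def to_ascii_binary(text: str, bits: int = 7) -> list[str]:
    """Return one binary string per character, padded to `bits` digits."""
    patterns = []
    for ch in text:
        code = ord(ch)  # numeric code of the character
        if code > 127:
            # ASCII only covers codes 0 to 127
            raise ValueError(f"{ch!r} is not an ASCII character")
        patterns.append(format(code, f"0{bits}b"))
    return patterns


print(to_ascii_binary("Hi"))  # ['1001000', '1101001']
```

Passing `bits=8` pads each pattern with a leading 0, which is how ASCII characters are usually stored in a byte.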

Images:

Images are typically represented as bitmap graphics. Each image consists of tiny squares called pixels.

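The color of each pixel is encoded as a binary value. For example, with 1 bit per pixel, 0 could stand for white and 1 for black.

Additionally, the color depth (the number of bits used per pixel) determines the range of colors an image can display: n bits per pixel allow 2^n different colors. A higher color depth allows more vibrant and detailed images.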
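As a rough illustration, the sketch below stores a tiny 1-bit bitmap as rows of 0s and 1s and prints it. The picture, the names `BITMAP`, `render` and `colors_available`, and the choice of 1 meaning a dark pixel are all assumptions made for this example.

```python
# A hypothetical 5x5 bitmap with 1 bit per pixel: 1 = dark pixel, 0 = light pixel.
BITMAP = [
    "01110",
    "10001",
    "10001",
    "10001",
    "01110",
]


def render(bitmap: list[str]) -> str:
    """Turn each bit into a visible character so the picture can be printed."""
    return "\n".join(
        "".join("#" if bit == "1" else "." for bit in row) for row in bitmap
    )


def colors_available(color_depth: int) -> int:
    """Number of distinct colors that `color_depth` bits per pixel can encode."""
    return 2 ** color_depth


print(render(BITMAP))
for depth in (1, 8, 24):
    print(depth, "bits per pixel:", colors_available(depth), "colors")
```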

Sound:

Sound is captured through sampling. Analog sound waves are converted into digital samples.

The bit depth (number of bits per sample) affects sound quality. More bits provide greater accuracy.

Common audio formats (like MP3) use bit patterns to represent sound data.
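The idea of sampling can be sketched in a few lines of Python. This is a simplified model, not how any particular audio format works: it measures one cycle of a sine wave at evenly spaced points and rounds each measurement to the nearest level that the chosen bit depth can store. The name `sample_wave` is hypothetical.

```python
import math


def sample_wave(num_samples: int, bit_depth: int) -> list[str]:
    """Sample one cycle of a sine wave and store each sample as a bit pattern."""
    levels = 2 ** bit_depth  # how many different values one sample can take
    samples = []
    for i in range(num_samples):
        amplitude = math.sin(2 * math.pi * i / num_samples)  # between -1 and 1
        # Scale the amplitude to a whole number from 0 to levels - 1.
        level = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(format(level, f"0{bit_depth}b"))
    return samples


print(sample_wave(8, 3))  # 3-bit samples: only 8 possible levels
print(sample_wave(8, 8))  # 8-bit samples: 256 possible levels, so less rounding
```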

Here are some practical examples of file sizes:


Storing Values in Binary

With only 2 values (0 and 1) per bit, we need to understand how many different values we can make with n bits.

“There are 2^n possible variations with n bits”

If we had 2 bits, we could arrange them in 2^2 = 4 different ways:

00, 01, 10, 11

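| Bits | 2^n | Combinations |
| --- | --- | --- |
| 1 | 2^1 | 2 |
| 2 | 2^2 | 4 |
| 3 | 2^3 | 8 |
| 4 | 2^4 | 16 |
| 5 | 2^5 | 32 |
| 6 | 2^6 | 64 |
| 7 | 2^7 | 128 |
| 8 | 2^8 | 256 |
| ... | ... | ... |
| 16 | 2^16 | 65,536 |
| 32 | 2^32 | 4,294,967,296 |
| 64 | 2^64 | 18,446,744,073,709,551,616 |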
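The 2^n rule can be checked by listing every pattern. The sketch below uses `itertools.product` from Python's standard library; the helper name `bit_patterns` is just for illustration.

```python
from itertools import product


def bit_patterns(n: int) -> list[str]:
    """List every possible arrangement of n bits, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_patterns(2))  # ['00', '01', '10', '11']

# The number of patterns always matches 2 to the power n.
for n in (1, 2, 3, 8):
    print(n, "bits:", len(bit_patterns(n)), "patterns, 2^n =", 2 ** n)
```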

Storage Prefix

Computers process and store large numbers of bytes, often in the order of millions or billions.

When dealing with large quantities it is more convenient to summarise this using number prefixes.

A common example of this is the kilogram (kg) which is equivalent to 1000 grams (g).

When describing quantities of bytes we use either binary prefixes (powers of 2) or decimal prefixes (powers of 10).

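Base 2: binary prefixes (exponents 10, 20, 30, ...; each step is x1,024)

| Unit | 2^n | Value (bytes) |
| --- | --- | --- |
| kibibyte (KiB) | 2^10 | 1,024 |
| mebibyte (MiB) | 2^20 | 1,048,576 |
| gibibyte (GiB) | 2^30 | 1,073,741,824 |
| tebibyte (TiB) | 2^40 | 1,099,511,627,776 |
| pebibyte (PiB) | 2^50 | 1,125,899,906,842,624 |
| exbibyte (EiB) | 2^60 | 1,152,921,504,606,846,976 |

Base 10: decimal (metric) prefixes (exponents 3, 6, 9, ...; each step is x1,000)

| Unit | 10^n | Value (bytes) |
| --- | --- | --- |
| kilobyte (kB) | 10^3 | 1,000 |
| megabyte (MB) | 10^6 | 1,000,000 |
| gigabyte (GB) | 10^9 | 1,000,000,000 |
| terabyte (TB) | 10^12 | 1,000,000,000,000 |
| petabyte (PB) | 10^15 | 1,000,000,000,000,000 |
| exabyte (EB) | 10^18 | 1,000,000,000,000,000,000 |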

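The same number prefixes used for decimal values can be used to summarise large quantities of bytes.

Common prefixes include kilo (10^3), mega (10^6), giga (10^9) and tera (10^12).

Traditionally, computer scientists used these same prefixes to refer to groups of bytes based on powers of 2.

These are not the same as their decimal equivalents.

For example, a "kilobyte" was often taken to mean 1,024 bytes (2^10) rather than 1,000 bytes (10^3).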

To eliminate this confusion, in 1998 the International Electrotechnical Commission (IEC) established different prefixes to represent multiples of base 2:

Storage devices are addressed using binary (base 2) numbers, so binary prefixes describe their capacity more accurately.
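The gap between the two families of prefixes grows as the units get bigger. The following sketch compares them; the dictionary and function names are invented for this example, and the values are simply the powers of 2 and 10 described above.

```python
# Number of bytes in each unit.
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
DECIMAL = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def percent_larger(binary_unit: str, decimal_unit: str) -> float:
    """How much bigger (in percent) the binary unit is than the decimal one."""
    b = BINARY[binary_unit]
    d = DECIMAL[decimal_unit]
    return (b - d) / d * 100


for binary_unit, decimal_unit in zip(BINARY, DECIMAL):
    diff = percent_larger(binary_unit, decimal_unit)
    print(f"1 {binary_unit} is {diff:.2f}% larger than 1 {decimal_unit}")
```

A kibibyte is only 2.4% larger than a kilobyte, but by the time we reach tebibytes and terabytes the difference is almost 10%.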


Units of Measurement

In the metric system, we have straightforward conversions: 1 kilometer (km) equals 1000 meters (m), and 1 kilogram (kg) equals 1000 grams (g). These powers of 10 make sense and are easy to remember.

However, when it comes to digital storage, things get interesting. Computers use binary (base 2) for everything. So, when they measure storage, they work in powers of 2. A kilobyte (kB) is commonly understood to be 1000 bytes. However, there's a historical twist.

In binary terms, a kilobyte was traditionally treated as 1024 bytes (2^10). This discrepancy arises because 1024 is the power of 2 closest to 1000. To address this, modern terminology introduces the kibibyte (KiB). A kibibyte is precisely 1024 bytes, matching the binary reality. Meanwhile, a kilobyte (kB) remains 1000 bytes (as per the metric system).

Binary patterns can be used to represent any data, be it text, sound, images or video. The number of possible bit patterns increases as the number of bits increases.

Unit Conversion Table

Use this table to help you convert units. It's not a great table; I'm working on it...

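| Div | Unit | Mult |
| --- | --- | --- |
| | bit | |
| ÷ 8 | byte | × 8 |
| ÷ 1,000 | kilobyte | × 1,000 |
| ÷ 1,000 | megabyte | × 1,000 |
| ÷ 1,000 | gigabyte | × 1,000 |
| ÷ 1,000 | terabyte | × 1,000 |
| ÷ 1,000 | petabyte | × 1,000 |

Div: divide by this number to convert from the unit in the row above into this unit. Mult: multiply by this number to convert from this unit into the unit in the row above.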
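Unit conversions like these can also be sketched in Python. This example assumes the decimal (metric) prefixes, where each step is 1,000, and stores the size of every unit in bits; the names `UNITS_IN_BITS` and `convert` are hypothetical.

```python
# Size of each unit in bits, using decimal prefixes (steps of 1,000).
UNITS_IN_BITS = {
    "bit": 1,
    "nibble": 4,
    "byte": 8,
    "kilobyte": 8 * 1000,
    "megabyte": 8 * 1000**2,
    "gigabyte": 8 * 1000**3,
    "terabyte": 8 * 1000**4,
    "petabyte": 8 * 1000**5,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity by going through bits as the common unit."""
    bits = value * UNITS_IN_BITS[from_unit]
    return bits / UNITS_IN_BITS[to_unit]


print(convert(16, "bit", "byte"))             # 2.0
print(convert(2, "kilobyte", "byte"))         # 2000.0
print(convert(4000, "megabyte", "gigabyte"))  # 4.0
```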

Lesson Notes

7 bit ASCII Table

ASCII (pronounced "az-kee" or "ass-key" if American) stands for the American Standard Code for Information Interchange. It serves as a character encoding standard used for electronic communication between computers, telecommunications equipment, and other devices. Here are some key points about ASCII:

  1. Character Encoding: ASCII assigns standard numeric values to letters, numerals, punctuation marks, and other characters commonly used in computers. Each character is represented by a unique numerical code.
  2. 128 Values: Initially, ASCII had only 128 code values, of which only 95 are printable characters. These include digits (0 to 9), lowercase letters (a to z), uppercase letters (A to Z), and punctuation symbols. The remaining 33 codes were non-printing control characters, such as carriage return and line feed.
  3. Binary Representation: ASCII encodes characters into seven-bit integers. For instance, the lowercase letter "i" is represented by binary 1101001 (hexadecimal 69 or decimal 105).
  4. Evolution and Scope: While modern computer systems have transitioned to Unicode (which has room for more than a million code points), the first 128 Unicode code points align with the original ASCII set. ASCII remains a fundamental foundation for character encoding in computing.

Despite being an American standard, ASCII does not include a code point for the cent symbol (¢) or support English terms with diacritical marks (such as résumé and jalapeño) or proper nouns with diacritical marks (such as Beyoncé).
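A quick way to test whether a piece of text fits inside ASCII is to check that every character code is below 128. The helper `is_ascii` below is a sketch written for this example (Python strings also have a built-in `isascii()` method that does the same job).

```python
def is_ascii(text: str) -> bool:
    """True if every character has a code in the ASCII range 0 to 127."""
    return all(ord(ch) < 128 for ch in text)


for word in ("resume", "résumé", "jalapeño", "¢"):
    print(word, is_ascii(word))

# The first 128 Unicode code points match ASCII,
# so an ASCII letter is stored as the same byte either way.
print("A".encode("ascii") == "A".encode("utf-8"))  # True
```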

Binary Dec Hex Char Binary Dec Hex Char Binary Dec Hex Char
0100000 32 20 (space) 1000000 64 40 @ 1100000 96 60 `
0100001 33 21 ! 1000001 65 41 A 1100001 97 61 a
0100010 34 22 " 1000010 66 42 B 1100010 98 62 b
0100011 35 23 # 1000011 67 43 C 1100011 99 63 c
0100100 36 24 $ 1000100 68 44 D 1100100 100 64 d
0100101 37 25 % 1000101 69 45 E 1100101 101 65 e
0100110 38 26 & 1000110 70 46 F 1100110 102 66 f
0100111 39 27 ' 1000111 71 47 G 1100111 103 67 g
0101000 40 28 ( 1001000 72 48 H 1101000 104 68 h
0101001 41 29 ) 1001001 73 49 I 1101001 105 69 i
0101010 42 2A * 1001010 74 4A J 1101010 106 6A j
0101011 43 2B + 1001011 75 4B K 1101011 107 6B k
0101100 44 2C , 1001100 76 4C L 1101100 108 6C l
0101101 45 2D - 1001101 77 4D M 1101101 109 6D m
0101110 46 2E . 1001110 78 4E N 1101110 110 6E n
0101111 47 2F / 1001111 79 4F O 1101111 111 6F o
0110000 48 30 0 1010000 80 50 P 1110000 112 70 p
0110001 49 31 1 1010001 81 51 Q 1110001 113 71 q
0110010 50 32 2 1010010 82 52 R 1110010 114 72 r
0110011 51 33 3 1010011 83 53 S 1110011 115 73 s
0110100 52 34 4 1010100 84 54 T 1110100 116 74 t
0110101 53 35 5 1010101 85 55 U 1110101 117 75 u
0110110 54 36 6 1010110 86 56 V 1110110 118 76 v
0110111 55 37 7 1010111 87 57 W 1110111 119 77 w
0111000 56 38 8 1011000 88 58 X 1111000 120 78 x
0111001 57 39 9 1011001 89 59 Y 1111001 121 79 y
0111010 58 3A : 1011010 90 5A Z 1111010 122 7A z
0111011 59 3B ; 1011011 91 5B [ 1111011 123 7B {
0111100 60 3C < 1011100 92 5C \ 1111100 124 7C |
0111101 61 3D = 1011101 93 5D ] 1111101 125 7D }
0111110 62 3E > 1011110 94 5E ^ 1111110 126 7E ~
0111111 63 3F ? 1011111 95 5F _ 1111111 127 7F DEL
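Any row of the table can be checked by regenerating it. The sketch below builds the binary, decimal and hexadecimal columns straight from the character code; `ascii_row` is a name invented for this example, and the labels "space" and "DEL" stand in for the two codes in this range that do not print as a visible symbol.

```python
def ascii_row(code: int) -> str:
    """Build one table row: 7-bit binary, decimal, hexadecimal and character."""
    if code == 32:
        char = "space"
    elif code == 127:
        char = "DEL"
    else:
        char = chr(code)
    return f"{code:07b} {code} {code:02X} {char}"


print(ascii_row(65))   # 1000001 65 41 A
print(ascii_row(105))  # 1101001 105 69 i

# Print the whole range covered by the table above.
for code in range(32, 128):
    print(ascii_row(code))
```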

8 bit ASCII

8-bit ASCII, also known as Extended ASCII, builds upon the original American Standard Code for Information Interchange (ASCII) system by using 8 binary digits (bits) for each character.

Standard ASCII represents characters using 7 bits (128 code points). 8-bit ASCII extends this to 256 characters by using 8 bits per character. The additional bit allows for a broader range of characters, including special symbols, accented letters, and other language-specific characters.
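Several different 8-bit extensions of ASCII exist. As an illustration only, the sketch below uses Latin-1 (ISO 8859-1), one common example that Python supports through the codec name `"latin-1"`; the function name `latin1_bits` is hypothetical.

```python
def latin1_bits(ch: str) -> str:
    """Return the 8-bit pattern for one character in the Latin-1 encoding."""
    byte = ch.encode("latin-1")[0]  # each Latin-1 character is exactly one byte
    return format(byte, "08b")


for ch in ("A", "é", "ñ", "¢"):
    print(ch, ord(ch), latin1_bits(ch))
```

Characters from the original ASCII set, such as A, start with a 0 bit. The extra characters, such as é, use codes from 128 to 255, so their patterns start with a 1.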

A Spooky Ghost

        _,.--.
      .'      `-.
     /   O O   \
    |          /
    |         /
    |        /
     \      /
      `.__.'
    

A Cat

        /\_/\
       ( o.o )
      > ^ <
     /  ---  \
    /         \
   /           \
  

????

        /\
        /  \
       / o o \
      /   ^   \
     /         \
    /_/-\___\_\
    

An Apple

        ,--./,-.
        / #      \
       |          |
        \        / 
         `._,._,'
    

Unicode

Unicode, formally known as The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium. Its purpose is to support the use of text written in all of the world's major writing systems.

Unicode assigns a unique number to every character, regardless of the platform, program, or language. Before Unicode, various character encodings existed, each with limitations. These early encoding methods could not cover all languages and often conflicted with one another. Unicode changed this by providing a consistent way to represent characters across different languages.

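Unicode was originally designed as a 16-bit code, giving 65,536 possible characters. Today its characters are stored using encodings such as UTF-8, UTF-16 and UTF-32, which use between 8 and 32 bits per character.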

Here are some example characters from the Unicode Character Set:

  • こんにちは (Japanese)
  • 厦灣 (Chinese)
  • 한국 (Korean)
  • ćōčīūīū (Hawaiian)
  • العربية (Arabic)
  • Hello World (English)
  • سلام الليكم (Urdu)
  • বাইলার বালার (Bengali)
  • हमात वारीन्र (Hindi)
  • Γεια σου (Greek)
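To look at the number Unicode assigns to a character, Python's built-in `ord` can be used. The sketch below also shows how many bytes each character takes in UTF-8, one common way of storing Unicode text. The function name `describe` is made up for this example, and the emoji is an extra character added here to show a code point that needs more than 16 bits.

```python
def describe(ch: str) -> str:
    """Show a character's Unicode code point and its size in UTF-8."""
    code_point = ord(ch)
    utf8_bytes = len(ch.encode("utf-8"))
    return f"U+{code_point:04X} uses {utf8_bytes} byte(s) in UTF-8"


for ch in ("H", "Γ", "한", "😀"):
    print(ch, describe(ch))
```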

How to work out file size of text...

number of characters x bits used to represent each character = file size in bits

Let's work out the file size of the phrase: Social Distancing
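One way to work this out is with a short Python sketch that applies the formula above. The function name `text_file_size_bits` is hypothetical, and the choice of 7 or 8 bits per character assumes the text is stored as 7-bit or 8-bit ASCII.

```python
def text_file_size_bits(text: str, bits_per_character: int) -> int:
    """number of characters x bits per character = file size in bits"""
    return len(text) * bits_per_character


phrase = "Social Distancing"
print(len(phrase))                          # 17 (the space counts as a character)
print(text_file_size_bits(phrase, 7))       # 119 bits with 7-bit ASCII
print(text_file_size_bits(phrase, 8))       # 136 bits with 8-bit ASCII
print(text_file_size_bits(phrase, 8) // 8)  # 17 bytes
```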

